Alibaba's latest speech models, CosyVoice for speech synthesis and SenseVoice for speech recognition, together form the FunAudioLLM framework, which aims to make human-computer interaction more natural. CosyVoice generates realistic speech: it can mimic voices of different genders, ages, and personalities, add emotion and style, and even reproduce natural vocal features such as laughter, coughing, and breathing. SenseVoice focuses on high-precision multilingual speech recognition, emotion detection, and audio event detection, and supports over 50 languages.